8.2.3 Maximum Likelihood Estimation

So far, we have discussed estimating the mean and variance of a distribution. Our methods have been 
somewhat ad hoc. More specifically, it is not clear how we can estimate other parameters. We now 
would like to talk about a systematic way of parameter estimation. Specifically, we would like to 
introduce an estimation method, called maximum likelihood estimation (MLE). To give you the idea 
behind MLE let us look at an example. 

Example 8.7 

I have a bag that contains 3 balls. Each ball is either red or blue, but I have no information in addition
to this. Thus, the number of blue balls, call it $\theta$, might be 0, 1, 2, or 3. I am allowed to choose 4 balls
at random from the bag with replacement. We define the random variables $X_1$, $X_2$, $X_3$, and $X_4$ as
follows:

$$X_i = \begin{cases} 1 & \text{if the } i\text{th chosen ball is blue} \\ 0 & \text{if the } i\text{th chosen ball is red} \end{cases}$$

Note that $X_i$'s are i.i.d. and $X_i \sim Bernoulli(\frac{\theta}{3})$. After doing my experiment, I observe the following
values for $X_i$'s:

$$x_1 = 1, \quad x_2 = 0, \quad x_3 = 1, \quad x_4 = 1.$$

Thus, I observe 3 blue balls and 1 red ball.

1. For each possible value of $\theta$, find the probability of the observed sample,
$(x_1, x_2, x_3, x_4) = (1, 0, 1, 1)$.

2. For which value of $\theta$ is the probability of the observed sample the largest?


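• Solution

o Since $X_i \sim Bernoulli(\frac{\theta}{3})$, we have

$$P_{X_i}(x) = \begin{cases} \frac{\theta}{3} & \text{for } x = 1 \\ 1 - \frac{\theta}{3} & \text{for } x = 0 \end{cases}$$

Since $X_i$'s are independent, the joint PMF of $X_1$, $X_2$, $X_3$, and $X_4$ can be written as

$$P_{X_1 X_2 X_3 X_4}(x_1, x_2, x_3, x_4) = P_{X_1}(x_1) P_{X_2}(x_2) P_{X_3}(x_3) P_{X_4}(x_4).$$

Therefore,

$$P_{X_1 X_2 X_3 X_4}(1, 0, 1, 1) = \frac{\theta}{3} \cdot \left(1 - \frac{\theta}{3}\right) \cdot \frac{\theta}{3} \cdot \frac{\theta}{3} = \left(\frac{\theta}{3}\right)^3 \left(1 - \frac{\theta}{3}\right).$$

Note that the joint PMF depends on $\theta$, so we write it as $P_{X_1 X_2 X_3 X_4}(x_1, x_2, x_3, x_4; \theta)$. We
obtain the values given in Table 8.1 for the probability of $(1, 0, 1, 1)$.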


$\theta$    $P_{X_1 X_2 X_3 X_4}(1, 0, 1, 1; \theta)$
0           0
1           0.0247
2           0.0988
3           0

Table 8.1: Values of $P_{X_1 X_2 X_3 X_4}(1, 0, 1, 1; \theta)$ for Example 8.7


The probability of the observed sample for $\theta = 0$ and $\theta = 3$ is zero. This makes sense because
our sample included both red and blue balls. From the table we see that the probability of
the observed data is maximized for $\theta = 2$. This means that the observed data is most
likely to occur for $\theta = 2$. For this reason, we may choose $\hat{\theta} = 2$ as our estimate of $\theta$. This
is called the maximum likelihood estimate (MLE) of $\theta$.
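The search over the four candidate values of $\theta$ can be reproduced with a short Python sketch. The names `likelihood`, `table`, and `theta_ml` are illustrative choices, not part of the text; exact fractions are used so the entries can be compared with Table 8.1 without rounding concerns.

```python
from fractions import Fraction

def likelihood(theta, sample, n_balls=3):
    """Probability of the observed 0/1 sample when the bag holds `theta` blue balls."""
    p_blue = Fraction(theta, n_balls)  # each draw is Bernoulli(theta / 3)
    prob = Fraction(1)
    for x in sample:
        prob *= p_blue if x == 1 else 1 - p_blue
    return prob

sample = (1, 0, 1, 1)  # observed values x_1, ..., x_4
table = {theta: likelihood(theta, sample) for theta in range(4)}

# The MLE is the candidate value with the largest likelihood.
theta_ml = max(table, key=table.get)

for theta, prob in table.items():
    print(theta, prob, round(float(prob), 4))
print("maximum likelihood estimate:", theta_ml)
```

Because $\theta$ takes only four values here, an exhaustive comparison is enough and no differentiation is involved.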


The above example gives us the idea behind the maximum likelihood estimation. Here, we introduce
this method formally. To do so, we first define the likelihood function. Let $X_1, X_2, X_3, \cdots, X_n$ be a
random sample from a distribution with a parameter $\theta$ (in general, $\theta$ might be a vector,
$\theta = (\theta_1, \theta_2, \cdots, \theta_k)$). Suppose that $x_1, x_2, x_3, \cdots, x_n$ are the observed values of
$X_1, X_2, X_3, \cdots, X_n$. If $X_i$'s are discrete random variables, we define the likelihood function as the
probability of the observed sample as a function of $\theta$:

$$\begin{aligned}
L(x_1, x_2, \cdots, x_n; \theta) &= P(X_1 = x_1, X_2 = x_2, \cdots, X_n = x_n; \theta) \\
&= P_{X_1 X_2 \cdots X_n}(x_1, x_2, \cdots, x_n; \theta).
\end{aligned}$$

To get a more compact formula, we may use the vector notation, $\mathbf{X} = (X_1, X_2, \cdots, X_n)$. Thus, we may
write

$$L(\mathbf{x}; \theta) = P_{\mathbf{X}}(\mathbf{x}; \theta).$$

If $X_1, X_2, X_3, \cdots, X_n$ are jointly continuous, we use the joint PDF instead of the joint PMF. Thus, the
likelihood is defined by

$$L(x_1, x_2, \cdots, x_n; \theta) = f_{X_1 X_2 \cdots X_n}(x_1, x_2, \cdots, x_n; \theta).$$

Let $X_1, X_2, X_3, \cdots, X_n$ be a random sample from a distribution with a parameter $\theta$. Suppose that we
have observed $X_1 = x_1$, $X_2 = x_2$, $\cdots$, $X_n = x_n$.

1. If $X_i$'s are discrete, then the likelihood function is defined as

$$L(x_1, x_2, \cdots, x_n; \theta) = P_{X_1 X_2 \cdots X_n}(x_1, x_2, \cdots, x_n; \theta).$$

2. If $X_i$'s are jointly continuous, then the likelihood function is defined as

$$L(x_1, x_2, \cdots, x_n; \theta) = f_{X_1 X_2 \cdots X_n}(x_1, x_2, \cdots, x_n; \theta).$$

In some problems, it is easier to work with the log likelihood function given by

$$\ln L(x_1, x_2, \cdots, x_n; \theta).$$
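As a minimal sketch of these definitions in Python: for a random sample the $X_i$'s are i.i.d., so the joint PMF or PDF factors into a product of marginals and the log likelihood becomes a sum of logs. The helper names `likelihood`, `log_likelihood`, and `bernoulli_pmf` are hypothetical, introduced only for illustration.

```python
import math

def likelihood(marginal, xs, theta):
    """L(x_1, ..., x_n; theta) for an i.i.d. sample: product of the marginals."""
    return math.prod(marginal(x, theta) for x in xs)

def log_likelihood(marginal, xs, theta):
    """ln L(x_1, ..., x_n; theta): sum of the log-marginals (needs positive marginals)."""
    return sum(math.log(marginal(x, theta)) for x in xs)

def bernoulli_pmf(x, p):
    """PMF of a Bernoulli(p) random variable at x in {0, 1}."""
    return p if x == 1 else 1 - p

xs = [1, 0, 1, 1]
p = 2 / 3  # success probability used for this illustration

L = likelihood(bernoulli_pmf, xs, p)
lnL = log_likelihood(bernoulli_pmf, xs, p)
print(L, lnL, math.exp(lnL))
```

Since the logarithm is increasing, the value of $\theta$ that maximizes $\ln L$ also maximizes $L$, which is why either function may be used for estimation.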


Example 8.8 

For the following random samples, find the likelihood function:

1. $X_i \sim Binomial(3, \theta)$, and we have observed $(x_1, x_2, x_3, x_4) = (1, 3, 2, 2)$.

2. $X_i \sim Exponential(\theta)$ and we have observed $(x_1, x_2, x_3, x_4) = (1.23, 3.32, 1.98, 2.12)$.

• Solution

o Remember that when we have a random sample, $X_i$'s are i.i.d., so we can obtain the joint
PMF and PDF by multiplying the marginal (individual) PMFs and PDFs.

1. If $X_i \sim Binomial(3, \theta)$, then

$$P_{X_i}(x; \theta) = \binom{3}{x} \theta^x (1 - \theta)^{3 - x}.$$

Thus,


https://www.probabilitycourse.com/chapter8/8_2_3_max_likelihood_estimation.php 





9/18/2018 




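$$\begin{aligned}
L(x_1, x_2, x_3, x_4; \theta) &= P_{X_1 X_2 X_3 X_4}(x_1, x_2, x_3, x_4; \theta) \\
&= P_{X_1}(x_1; \theta) P_{X_2}(x_2; \theta) P_{X_3}(x_3; \theta) P_{X_4}(x_4; \theta) \\
&= \binom{3}{x_1} \binom{3}{x_2} \binom{3}{x_3} \binom{3}{x_4} \theta^{x_1 + x_2 + x_3 + x_4} (1 - \theta)^{12 - (x_1 + x_2 + x_3 + x_4)}.
\end{aligned}$$

Since we have observed $(x_1, x_2, x_3, x_4) = (1, 3, 2, 2)$, we have

$$L(1, 3, 2, 2; \theta) = \binom{3}{1} \binom{3}{3} \binom{3}{2} \binom{3}{2} \theta^8 (1 - \theta)^4 = 27 \, \theta^8 (1 - \theta)^4.$$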

2. If $X_i \sim Exponential(\theta)$, then

$$f_{X_i}(x; \theta) = \theta e^{-\theta x} u(x),$$

where $u(x)$ is the unit step function, i.e., $u(x) = 1$ for $x \geq 0$ and $u(x) = 0$ for $x < 0$.
Thus, for $x_i \geq 0$, we can write

$$\begin{aligned}
L(x_1, x_2, x_3, x_4; \theta) &= f_{X_1 X_2 X_3 X_4}(x_1, x_2, x_3, x_4; \theta) \\
&= f_{X_1}(x_1; \theta) f_{X_2}(x_2; \theta) f_{X_3}(x_3; \theta) f_{X_4}(x_4; \theta) \\
&= \theta^4 e^{-(x_1 + x_2 + x_3 + x_4) \theta}.
\end{aligned}$$

Since we have observed $(x_1, x_2, x_3, x_4) = (1.23, 3.32, 1.98, 2.12)$, we have

$$L(1.23, 3.32, 1.98, 2.12; \theta) = \theta^4 e^{-8.65 \theta}.$$
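As a numerical cross-check of the two closed forms, the following Python sketch multiplies the marginal PMFs and PDFs directly and compares the result with $27\theta^8(1-\theta)^4$ and $\theta^4 e^{-8.65\theta}$. The function names and the test point `theta = 0.4` are arbitrary illustrative choices.

```python
import math

def binomial_pmf(x, theta, m=3):
    """PMF of Binomial(m, theta) at x."""
    return math.comb(m, x) * theta**x * (1 - theta) ** (m - x)

def exponential_pdf(x, theta):
    """PDF of Exponential(theta) at x; zero for negative x (unit step)."""
    return theta * math.exp(-theta * x) if x >= 0 else 0.0

binom_obs = (1, 3, 2, 2)
expo_obs = (1.23, 3.32, 1.98, 2.12)

def L_binom(theta):
    """Likelihood of the binomial sample: product of marginal PMFs."""
    return math.prod(binomial_pmf(x, theta) for x in binom_obs)

def L_expo(theta):
    """Likelihood of the exponential sample: product of marginal PDFs."""
    return math.prod(exponential_pdf(x, theta) for x in expo_obs)

theta = 0.4  # arbitrary point at which to compare both expressions
print(L_binom(theta), 27 * theta**8 * (1 - theta) ** 4)
print(L_expo(theta), theta**4 * math.exp(-8.65 * theta))
```

Each printed pair should agree up to floating-point rounding.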


Now that we have defined the likelihood function, we are ready to define maximum likelihood
estimation. Let $X_1, X_2, X_3, \cdots, X_n$ be a random sample from a distribution with a parameter $\theta$.
Suppose that we have observed $X_1 = x_1$, $X_2 = x_2$, $\cdots$, $X_n = x_n$. The maximum likelihood estimate of $\theta$,
shown by $\hat{\theta}_{ML}$, is the value that maximizes the likelihood function

$$L(x_1, x_2, \cdots, x_n; \theta).$$

Figure 8.1 illustrates finding the maximum likelihood estimate as the maximizing value of $\theta$ for the
likelihood function. There are two cases shown in the figure: In the first graph, $\theta$ is a discrete-valued
parameter, such as the one in Example 8.7. In the second one, $\theta$ is a continuous-valued parameter,
such as the ones in Example 8.8. In both cases, the maximum likelihood estimate of $\theta$ is the value that
maximizes the likelihood function.
[Figure: two plots of $L(x_1, x_2, \cdots, x_n; \theta)$ against $\theta$, one for a discrete-valued $\theta$ and one for a
continuous-valued $\theta$.]

Figure 8.1 - The maximum likelihood estimate for $\theta$.

Let us find the maximum likelihood estimates for the observations of Example 8.8. 

Example 8.9 

For the following random samples, find the maximum likelihood estimate of $\theta$:

1. $X_i \sim Binomial(3, \theta)$, and we have observed $(x_1, x_2, x_3, x_4) = (1, 3, 2, 2)$.

2. $X_i \sim Exponential(\theta)$ and we have observed $(x_1, x_2, x_3, x_4) = (1.23, 3.32, 1.98, 2.12)$.

• Solution

o 1. In Example 8.8, we found the likelihood function as

$$L(1, 3, 2, 2; \theta) = 27 \, \theta^8 (1 - \theta)^4.$$

To find the value of $\theta$ that maximizes the likelihood function, we can take the
derivative and set it to zero. We have

$$\frac{dL(1, 3, 2, 2; \theta)}{d\theta} = 27 \left[ 8 \theta^7 (1 - \theta)^4 - 4 \theta^8 (1 - \theta)^3 \right].$$

Setting this to zero for $0 < \theta < 1$ gives $8(1 - \theta) = 4\theta$. Thus, we obtain

$$\hat{\theta}_{ML} = \frac{2}{3}.$$

2. In Example 8.8, we found the likelihood function as

$$L(1.23, 3.32, 1.98, 2.12; \theta) = \theta^4 e^{-8.65 \theta}.$$

Here, it is easier to work with the log likelihood function,
$\ln L(1.23, 3.32, 1.98, 2.12; \theta)$. Specifically,

$$\ln L(1.23, 3.32, 1.98, 2.12; \theta) = 4 \ln \theta - 8.65 \theta.$$

By differentiating, we obtain

$$\frac{4}{\theta} - 8.65 = 0,$$

which results in

$$\hat{\theta}_{ML} = 0.46$$

It is worth noting that technically, we need to look at the second derivatives and endpoints 
to make sure that the values that we obtained above are the maximizing values. For this 
example, it turns out that the obtained values are indeed the maximizing values. 
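One informal way to check that these stationary points are maximizers, without computing second derivatives, is to evaluate the likelihood on a fine grid that includes the endpoints. The sketch below does this in Python; the helper `argmax_on_grid`, the grid resolution, and the search interval for the exponential case are assumptions made for illustration.

```python
import math

def argmax_on_grid(f, lo, hi, steps=100_000):
    """Return the grid point in [lo, hi] (endpoints included) where f is largest."""
    grid = (lo + (hi - lo) * k / steps for k in range(steps + 1))
    return max(grid, key=f)

def L_binom(theta):
    """Likelihood from part 1: 27 * theta^8 * (1 - theta)^4."""
    return 27 * theta**8 * (1 - theta) ** 4

def lnL_expo(theta):
    """Log likelihood from part 2: 4 ln(theta) - 8.65 * theta."""
    return 4 * math.log(theta) - 8.65 * theta

theta_binom = argmax_on_grid(L_binom, 0.0, 1.0)
# ln(theta) is undefined at 0, so the search starts slightly above it;
# the upper limit 5.0 is an arbitrary assumption for this sketch.
theta_expo = argmax_on_grid(lnL_expo, 1e-6, 5.0)

print(theta_binom, 2 / 3)
print(theta_expo, 4 / 8.65)
```

The grid maximizers should land within one grid step of $2/3$ and $4/8.65 \approx 0.46$, in agreement with the calculus-based answers.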


Note that the value of the maximum likelihood estimate is a function of the observed data. Thus, like
any other estimator, the maximum likelihood estimator (MLE), shown by $\hat{\Theta}_{ML}$, is indeed a random
variable. The MLE estimates $\hat{\theta}_{ML}$ that we found above were the values of the random variable $\hat{\Theta}_{ML}$
for the specified observed data.

The Maximum Likelihood Estimator (MLE)

Let $X_1, X_2, X_3, \cdots, X_n$ be a random sample from a distribution with a parameter $\theta$. Given that we
have observed $X_1 = x_1$, $X_2 = x_2$, $\cdots$, $X_n = x_n$, a maximum likelihood estimate of $\theta$, shown by $\hat{\theta}_{ML}$, is a
value of $\theta$ that maximizes the likelihood function

$$L(x_1, x_2, \cdots, x_n; \theta).$$

A maximum likelihood estimator (MLE) of the parameter $\theta$, shown by $\hat{\Theta}_{ML}$, is a random variable
$\hat{\Theta}_{ML} = \hat{\Theta}_{ML}(X_1, X_2, \cdots, X_n)$ whose value when $X_1 = x_1$, $X_2 = x_2$, $\cdots$, $X_n = x_n$ is given by $\hat{\theta}_{ML}$.

Example 8.10 

For the following examples, find the maximum likelihood estimator (MLE) of $\theta$:

1. $X_i \sim Binomial(m, \theta)$, and we have observed $X_1, X_2, X_3, \cdots, X_n$.

2. $X_i \sim Exponential(\theta)$ and we have observed $X_1, X_2, X_3, \cdots, X_n$.

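• Solution

o 1. Similar to our calculation in Example 8.8, for the observed values of $X_1 = x_1$,
$X_2 = x_2$, $\cdots$, $X_n = x_n$, the likelihood function is given by

$$\begin{aligned}
L(x_1, x_2, \cdots, x_n; \theta) &= P_{X_1 X_2 \cdots X_n}(x_1, x_2, \cdots, x_n; \theta) \\
&= \prod_{i=1}^{n} P_{X_i}(x_i; \theta) \\
&= \prod_{i=1}^{n} \binom{m}{x_i} \theta^{x_i} (1 - \theta)^{m - x_i} \\
&= \left[ \prod_{i=1}^{n} \binom{m}{x_i} \right] \theta^{\sum_{i=1}^{n} x_i} (1 - \theta)^{mn - \sum_{i=1}^{n} x_i}.
\end{aligned}$$

Note that the first term does not depend on $\theta$, so we may write $L(x_1, x_2, \cdots, x_n; \theta)$ as

$$L(x_1, x_2, \cdots, x_n; \theta) = c \, \theta^s (1 - \theta)^{mn - s},$$

where $c$ does not depend on $\theta$, and $s = \sum_{k=1}^{n} x_k$. By differentiating and setting the
derivative to 0 we obtain

$$\hat{\theta}_{ML} = \frac{1}{mn} \sum_{k=1}^{n} x_k.$$

This suggests that the MLE can be written as

$$\hat{\Theta}_{ML} = \frac{1}{mn} \sum_{k=1}^{n} X_k.$$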


2. Similar to our calculation in Example 8.8, for the observed values of $X_1 = x_1$,
$X_2 = x_2$, $\cdots$, $X_n = x_n$, the likelihood function is given by

$$\begin{aligned}
L(x_1, x_2, \cdots, x_n; \theta) &= \prod_{i=1}^{n} f_{X_i}(x_i; \theta) \\
&= \prod_{i=1}^{n} \theta e^{-\theta x_i} \\
&= \theta^n e^{-\theta \sum_{k=1}^{n} x_k}.
\end{aligned}$$

Therefore,

$$\ln L(x_1, x_2, \cdots, x_n; \theta) = n \ln \theta - \sum_{k=1}^{n} x_k \theta.$$

By differentiating and setting the derivative to 0 we obtain

$$\hat{\theta}_{ML} = \frac{n}{\sum_{k=1}^{n} x_k}.$$

This suggests that the MLE can be written as

$$\hat{\Theta}_{ML} = \frac{n}{\sum_{k=1}^{n} X_k}.$$
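The two closed-form estimators translate directly into code. The following Python sketch uses the hypothetical function names `mle_binomial` and `mle_exponential` and applies them to the observations of Example 8.9 as a consistency check.

```python
def mle_binomial(xs, m):
    """MLE of theta for X_i ~ Binomial(m, theta): sum(x_k) / (m * n)."""
    return sum(xs) / (m * len(xs))

def mle_exponential(xs):
    """MLE of theta for X_i ~ Exponential(theta): n / sum(x_k)."""
    return len(xs) / sum(xs)

# Observations from Example 8.9.
theta_binom = mle_binomial([1, 3, 2, 2], m=3)
theta_expo = mle_exponential([1.23, 3.32, 1.98, 2.12])

print(theta_binom)  # should agree with 2/3
print(theta_expo)   # should agree with 4/8.65, about 0.46
```

Writing the estimator as a function of the data makes the point of the preceding definition concrete: a different sample gives a different estimate, so the estimator itself is a random variable.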


The examples that we have discussed had only one unknown parameter $\theta$. In general, $\theta$ could be a
vector of parameters, and we can apply the same methodology to obtain the MLE. More specifically,
if we have $k$ unknown parameters $\theta_1, \theta_2, \cdots, \theta_k$, then we need to maximize the likelihood function

$$L(x_1, x_2, \cdots, x_n; \theta_1, \theta_2, \cdots, \theta_k)$$

to obtain the maximum likelihood estimators $\hat{\Theta}_1, \hat{\Theta}_2, \cdots, \hat{\Theta}_k$. Let's look at an example.


Example 8.11 

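Suppose that we have observed the random sample $X_1, X_2, X_3, \cdots, X_n$, where $X_i \sim N(\theta_1, \theta_2)$, so

$$f_{X_i}(x_i; \theta_1, \theta_2) = \frac{1}{\sqrt{2 \pi \theta_2}} e^{-\frac{(x_i - \theta_1)^2}{2 \theta_2}}.$$

Find the maximum likelihood estimators for $\theta_1$ and $\theta_2$.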
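• Solution

o The likelihood function is given by

$$L(x_1, x_2, \cdots, x_n; \theta_1, \theta_2) = \frac{1}{(2\pi)^{\frac{n}{2}} \theta_2^{\frac{n}{2}}} \exp \left( -\frac{1}{2 \theta_2} \sum_{i=1}^{n} (x_i - \theta_1)^2 \right).$$

Here again, it is easier to work with the log likelihood function

$$\ln L(x_1, x_2, \cdots, x_n; \theta_1, \theta_2) = -\frac{n}{2} \ln(2\pi) - \frac{n}{2} \ln \theta_2 - \frac{1}{2 \theta_2} \sum_{i=1}^{n} (x_i - \theta_1)^2.$$

We take the derivatives with respect to $\theta_1$ and $\theta_2$ and set them to zero:

$$\frac{\partial}{\partial \theta_1} \ln L(x_1, x_2, \cdots, x_n; \theta_1, \theta_2) = \frac{1}{\theta_2} \sum_{i=1}^{n} (x_i - \theta_1) = 0,$$

$$\frac{\partial}{\partial \theta_2} \ln L(x_1, x_2, \cdots, x_n; \theta_1, \theta_2) = -\frac{n}{2 \theta_2} + \frac{1}{2 \theta_2^2} \sum_{i=1}^{n} (x_i - \theta_1)^2 = 0.$$

By solving the above equations, we obtain the following maximum likelihood estimates
for $\theta_1$ and $\theta_2$:

$$\hat{\theta}_1 = \frac{1}{n} \sum_{i=1}^{n} x_i, \qquad \hat{\theta}_2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \hat{\theta}_1)^2.$$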

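We can write the MLE of $\theta_1$ and $\theta_2$ as random variables $\hat{\Theta}_1$ and $\hat{\Theta}_2$:

$$\hat{\Theta}_1 = \frac{1}{n} \sum_{i=1}^{n} X_i,$$

$$\hat{\Theta}_2 = \frac{1}{n} \sum_{i=1}^{n} (X_i - \hat{\Theta}_1)^2.$$

Note that $\hat{\Theta}_1$ is the sample mean, $\overline{X}$, and therefore it is an unbiased estimator of the mean.
Here, $\hat{\Theta}_2$ is very close to the sample variance which we defined as

$$S^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (X_i - \overline{X})^2.$$

In fact,

$$\hat{\Theta}_2 = \frac{n - 1}{n} S^2.$$

Since we already know that the sample variance is an unbiased estimator of the variance, we
conclude that $\hat{\Theta}_2$ is a biased estimator of the variance:

$$E \hat{\Theta}_2 = \frac{n - 1}{n} \theta_2.$$

Nevertheless, the bias, $-\frac{\theta_2}{n}$, is small here and it goes to zero as $n$ gets large.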
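The relation between the MLE of the variance and the sample variance can be checked numerically. The Python sketch below draws a small simulated normal sample and compares the two quantities. The function name `mle_normal`, the seed, the sample size, and the true parameter values are all arbitrary assumptions made for illustration.

```python
import random
import statistics

def mle_normal(xs):
    """MLEs (theta1_hat, theta2_hat) of the mean and variance of a normal sample."""
    n = len(xs)
    theta1_hat = sum(xs) / n                               # sample mean
    theta2_hat = sum((x - theta1_hat) ** 2 for x in xs) / n  # divides by n, not n - 1
    return theta1_hat, theta2_hat

random.seed(0)  # fixed seed so the sketch is repeatable
# Simulated data: mean 5.0, standard deviation 2.0 (so theta_2 = 4.0).
xs = [random.gauss(5.0, 2.0) for _ in range(10)]

theta1_hat, theta2_hat = mle_normal(xs)
n = len(xs)
s2 = statistics.variance(xs)  # sample variance with the 1 / (n - 1) factor

print(theta1_hat, statistics.mean(xs))
print(theta2_hat, (n - 1) / n * s2)
```

The second printed pair should match up to rounding, reflecting $\hat{\Theta}_2 = \frac{n-1}{n} S^2$; the MLE is always slightly smaller than the sample variance for the same data.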
Note: Here, we caution that we cannot always find the maximum likelihood estimator by setting the
derivative to zero. For example, if $\theta$ is an integer-valued parameter (such as the number of blue balls
in Example 8.7), then we cannot use differentiation and we need to find the maximizing value in
another way. Even if $\theta$ is a real-valued parameter, we cannot always find the MLE by setting the
derivative to zero. For example, the maximum might be obtained at the endpoints of the acceptable
ranges. We will see an example of such scenarios in the Solved Problems section (Section 8.2.5).